Using Mapreduce to Scale Events Correlation Discovery for Business Processes Mining
Authors
Hicham Reguieg, Farouk Toumani, Hamid Reza Motahari Nezhad, Boualem Benatallah
1 LIMOS, CNRS, Blaise Pascal University, Clermont-Ferrand, France, {reguieg,ftoumani}@isima.fr
2 CSE, UNSW, Sydney, Australia, [email protected]
3 HP Labs, Palo Alto, USA, [email protected]
HP Laboratories, HPL-2012-170
Keywords: business processes; Event Correlation; map reduce
External Posting Date: August 7, 2012 (approved for external publication). Internal Posting Date: August 7, 2012.
Copyright 2012 Hewlett-Packard Development Company, L.P.
Abstract
The volume of data related to business process execution is increasing significantly in the enterprise. Many data sources include events related to the execution of the same processes in various systems or applications. Event correlation is the task of analyzing a repository of event logs in order to find the set of events that belong to the same business process execution instance. This is a key step in the discovery of business processes from event execution logs. Event correlation is a computationally intensive task in the sense that it requires a deep analysis of very large and growing repositories of event logs, and exploration of various possible relationships among the events. In this paper, we present a scalable data analysis technique to support efficient event correlation for mining business processes. We propose a two-stage approach to compute correlation conditions and their entailed process instances from event logs using the MapReduce framework. The experimental results show that the algorithm scales well to large datasets.
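To make the idea of event correlation in a map/reduce style concrete, here is a minimal in-process sketch. It is not the authors' two-stage algorithm, whose details are not given in this abstract. It only assumes the simplest kind of correlation condition (two events belong to the same instance when they share the value of one attribute) and simulates the map, shuffle, and reduce phases with plain Python. The event log, the attribute names `order_id` and `customer`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: each event is a dict of attribute -> value.
EVENTS = [
    {"id": 1, "order_id": "o1", "customer": "c1"},
    {"id": 2, "order_id": "o1", "customer": "c2"},
    {"id": 3, "order_id": "o2", "customer": "c1"},
    {"id": 4, "order_id": "o2", "customer": "c3"},
]


def map_event(event, attributes):
    """Map step: emit ((attribute, value), event id) per candidate attribute."""
    for attr in attributes:
        if event.get(attr) is not None:
            yield (attr, event[attr]), event["id"]


def shuffle(pairs):
    """Group emitted values by key, as a framework's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, event_ids):
    """Reduce step: events sharing a value form one candidate instance."""
    attr, _value = key
    return attr, sorted(event_ids)


def correlate(events, attributes):
    """Return, per candidate attribute, the candidate instances it entails."""
    pairs = (pair for e in events for pair in map_event(e, attributes))
    instances = defaultdict(list)
    for key, ids in shuffle(pairs).items():
        attr, instance = reduce_group(key, ids)
        instances[attr].append(instance)
    return {attr: sorted(found) for attr, found in instances.items()}


result = correlate(EVENTS, ["order_id", "customer"])
print(result)
```

In this toy log, correlating on `order_id` splits the events into two instances of two events each, while `customer` yields one pair and two singletons. A real system would then have to judge which candidate conditions produce meaningful process instances, and that judgment is outside the scope of this sketch.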
Similar resources
Concept drift detection in business process logs using deep learning
Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects and etc.), traditional process mining techniques ...
A decentralized approach for mining event correlations in distributed system monitoring
Nowadays, there is an increasing demand to monitor, analyze, and control large scale distributed systems. Events detected during monitoring are temporally correlated, which is helpful to resource allocation, job scheduling, and failure prediction. To discover the correlations among detected events, many existing approaches concentrate detected events into an event database and perform data minin...
A Survey on Parallel Method for Rough Set using MapReduce Technique for Data Mining
This paper presents a survey on data mining, data mining using rough set theory, and data mining using a parallel method for rough set approximation with the MapReduce technique. With the development of information technology, data is growing at a tremendous rate, so big data mining and knowledge discovery have become a new challenge. Rough set theory has been successfully applied in data mining by using MapRe...
Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce
Despite crucial recent advances, the problem of frequent itemset mining is still facing major challenges. This is particularly the case when: i) the mining process must be massively distributed and; ii) the minimum support (MinSup) is very low. In this paper, we study the effectiveness and leverage of specific data placement strategies for improving parallel frequent itemset mining (PFIM) perfo...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
Publication date: 2012